Improved document image segmentation algorithm using multiresolution morphology
Abstract
Page segmentation into text and non-text components is an essential preprocessing step before OCR. If it is not done properly, an OCR classification engine produces garbage text because of the non-text components present on the page. This paper describes improvements to the text/image segmentation algorithm described by Bloomberg, which is also available in his open-source Leptonica library. The modifications yield significant improvements over Bloomberg's algorithm on the UW-III, UNLV, and ICDAR 2009 page segmentation competition test images and on circuit diagram datasets.
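To make the multiresolution-morphology idea concrete, here is a minimal sketch of a Bloomberg-style text/image separation on a binary page, assuming NumPy and SciPy are available. It is not Leptonica's implementation and not the improved algorithm described in this paper. The function names (`rank_reduce_2x`, `segment_text_image`), the rank levels `(4, 4)`, and the 5x5 opening are hypothetical, illustrative choices. The pipeline shown is: rank reduction to a coarse resolution (where thin text strokes vanish but solid image regions survive), an opening to clean up the coarse seed, expansion back to full resolution, and seed filling (binary reconstruction) into a slightly dilated copy of the page.

```python
import numpy as np
from scipy import ndimage


def rank_reduce_2x(img: np.ndarray, level: int) -> np.ndarray:
    """2x binary rank reduction: an output pixel is ON if at least `level`
    of the 4 pixels in its 2x2 block are ON (level in 1..4)."""
    h, w = img.shape
    # Pad to even dimensions so every pixel falls in a full 2x2 block.
    padded = np.pad(img, ((0, h % 2), (0, w % 2)))
    blocks = padded.reshape(padded.shape[0] // 2, 2, padded.shape[1] // 2, 2)
    return blocks.sum(axis=(1, 3)) >= level


def segment_text_image(img: np.ndarray, levels=(4, 4), open_size=5):
    """Hypothetical sketch: return (image_region_mask, text_pixels) for a
    binary page where True means ink. Parameters are illustrative only."""
    img = img.astype(bool)

    # 1. Cascade of rank reductions: thin strokes disappear, solid blobs remain.
    small = img
    for level in levels:
        small = rank_reduce_2x(small, level)

    # 2. Opening at low resolution removes any small residue left over.
    box = np.ones((open_size, open_size), dtype=bool)
    seed_small = ndimage.binary_opening(small, structure=box)

    # 3. Expand the seed back to full resolution by pixel replication.
    factor = 2 ** len(levels)
    seed = seed_small.repeat(factor, axis=0).repeat(factor, axis=1)
    seed = seed[: img.shape[0], : img.shape[1]]

    # 4. Seed fill (8-connected reconstruction) into a dilated copy of the page,
    #    so the whole connected image region is recovered from the seed.
    conn8 = np.ones((3, 3), dtype=bool)
    mask = ndimage.binary_dilation(img, structure=conn8)
    region = ndimage.binary_propagation(seed & mask, structure=conn8, mask=mask)

    # Whatever ink is not inside an image region is treated as text.
    text = img & ~region
    return region, text
```

On a synthetic page with one solid 24x24 block and two 1-pixel-wide lines, the block ends up in the image mask while the lines are kept as text, because no 2x2 block of a 1-pixel line satisfies rank level 4.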
Similar resources
Image Segmentation using Improved Imperialist Competitive Algorithm and a Simple Post-processing
Image segmentation is a fundamental step in many image processing applications. In most cases the image's pixels are clustered based only on the pixels' intensity or color information, and neither spatial nor neighborhood information of pixels is used in the clustering process. Considering the importance of including spatial information of pixels which improves the quality of image segmentati...
Object Detection in Medical Images Based on Improved Morphological Multiresolution Decomposition and Morphological Segmentation
A semi-automatic object detection method based on mathematical morphology image processing techniques is presented. This paper does not present a complete methodology but rather an illustration of a potential application of mathematical morphology to medical images. The method based on mathematical morphology tools includes an improved multiresolution morphological decomposition algorithm (IMMD...
Document Analysis And Classification Based On Passing Window
In this paper we present a document analysis and classification system to segment and classify the contents of Arabic document images. This system includes preprocessing, document segmentation, feature extraction and document classification. A document image is enhanced in the preprocessing stage by removing noise, binarization, and detecting and correcting image skew. In document segmentation, an algorith...
Robust Potato Color Image Segmentation using Adaptive Fuzzy Inference System
Potato image segmentation is an important part of image-based potato defect detection. This paper presents a robust potato color image segmentation through a combination of a fuzzy rule based system, an image thresholding based on Genetic Algorithm (GA) optimization and morphological operators. The proposed potato color image segmentation is robust against variation of background, distance and ...
An Improved Pixon-Based Approach for Image Segmentation
An improved pixon-based method is proposed in this paper for image segmentation. In this approach, a wavelet thresholding technique is initially applied to the image to reduce noise and to slightly smooth the image. This technique prevents the image from being oversegmented when the pixon-based method is used. Indeed, the wavelet thresholding, as a pre-processing step, eliminates the unnecessary details...